DEEP LEARNING IN IMAGE RECOGNITION: A COMPARATIVE REVIEW OF ARCHITECTURES AND MODELS

Authors

  • Alladi Deekshith, Sr. Software Engineer and Research Scientist, Department of Machine Learning, USA

Keywords:

Deep Learning, Image Recognition, Convolutional Neural Networks (CNNs), Residual Networks (ResNets), Vision Transformers (ViTs), Transfer Learning, Performance Metrics, Data Augmentation, Regularization Techniques, Benchmark Datasets

Abstract

Deep learning has revolutionized image recognition, delivering state-of-the-art performance across applications ranging from medical diagnostics to autonomous vehicles. This comparative review traces the evolution of deep learning architectures and models used in image recognition. We categorize and analyze prominent architectures, including Convolutional Neural Networks (CNNs), Residual Networks (ResNets), Inception networks, and more recent developments such as Vision Transformers (ViTs). The review highlights the key features, strengths, and limitations of each architecture and discusses their performance on standard benchmark datasets such as ImageNet, CIFAR-10, and MNIST. We also examine how transfer learning, data augmentation, and regularization techniques affect model performance. By synthesizing current research, this review aims to provide guidance on selecting appropriate architectures for specific image recognition tasks and to identify future research directions for enhancing the capabilities of deep learning models in this domain.
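
To ground the techniques surveyed in the abstract, the following minimal sketch (not part of the original article) illustrates one common recipe it discusses: transfer learning from an ImageNet-pretrained ResNet-18, light data augmentation, and weight-decay regularization, fine-tuned on CIFAR-10. It assumes PyTorch and torchvision are installed (the "weights" argument requires torchvision 0.13 or newer); all names here are illustrative, not taken from the article.

# Sketch: transfer learning with a pretrained ResNet-18 on CIFAR-10 (PyTorch).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Data augmentation and ImageNet-style normalization for the training split.
train_tf = transforms.Compose([
    transforms.Resize(224),                      # pretrained ResNet expects ~224x224 inputs
    transforms.RandomHorizontalFlip(),           # simple augmentation
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=train_tf)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=2)

# Transfer learning: start from ImageNet weights, freeze the backbone,
# and replace the final classifier for CIFAR-10's 10 classes.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 10)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Weight decay on the new classification head acts as a simple regularizer.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):                           # a few epochs is enough to illustrate the idea
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last batch loss = {loss.item():.3f}")

Freezing the backbone and training only the new head is the cheapest form of transfer learning; unfreezing deeper layers with a smaller learning rate typically improves accuracy at higher computational cost.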

Published

2023-12-16

How to Cite

[1]
Alladi Deekshith, “DEEP LEARNING IN IMAGE RECOGNITION: A COMPARATIVE REVIEW OF ARCHITECTURES AND MODELS”, IEJRD - International Multidisciplinary Journal, vol. 8, no. 6, p. 7, Dec. 2023.